Introduction

Knowledge of the fine-level population distribution is vital for measuring the impacts of socio-economic justice, optimizing resource allocation, and assessing the risks of environmental exposures. The traditional way to get population data is from the census bureau. However, population data released by the census bureau are too coarse and irregular to satisfy the requirements of accurate analysis. To solve the irregular shape and too coarse level problem of census data, we would downscale the census block group level data to census block. As for the method, the dasymetric method would be applied. The dasymetric method uses locality knowledge to depict the uneven population distribution within zones without assuming even distributions. This volume preserve method disaggregates the population within the source zone to the target zones according to the weighted layer generated by population-related ancillary information to preserve the total population count. The dasymetric method has been evaluated as a practical and feasible way for population disaggregation. Finally, the result will be evaluated by comparing it with census block data. This project aims to apply the dasymetric method to disaggregate census data to get fine-level population data using open-source data.

Materials and methods

Load request packages for this project.

library(sf)
library(leaflet)
library(piggyback)
library(tmap) 
library(lwgeom)
knitr::opts_chunk$set(cache=TRUE)  # cache the results for quick compiling

Data preprocessing

Download data

dataurl = "https://github.com/geo511-2022/final_project-SuiyuanWang/releases/download/v1.0.4/Data_GEO511_project.zip"

#tdir=tempdir()
#download.file(dataurl,destfile=file.path(tdir,"temp.zip"))
#unzip(file.path(tdir,"temp.zip"),exdir = tdir) #unzip the compressed folder

download.file(dataurl,"temp.zip")
unzip("temp.zip") #unzip the compressed folder

Load data.

  • Buffalo zoning data.
  • Census boundaries (2020) with population.
  • Census block population_block
  • Census block group population_bg
  • Microsoft building footprint (2018).
#landuse <- st_read("Data_GEO511_project/zoning.shp") 
source_geom <- st_read("Data_GEO511_project/census_bg.shp")
## Reading layer `census_bg' from data source 
##   `/__w/final_project-SuiyuanWang/final_project-SuiyuanWang/Data_GEO511_project/census_bg.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 290 features and 15 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -78.91425 ymin: 42.82602 xmax: -78.79516 ymax: 42.96646
## Geodetic CRS:  NAD83
target_geom <- st_read("Data_GEO511_project/census_block.shp")
## Reading layer `census_block' from data source 
##   `/__w/final_project-SuiyuanWang/final_project-SuiyuanWang/Data_GEO511_project/census_block.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3141 features and 8 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -78.91425 ymin: 42.82602 xmax: -78.79516 ymax: 42.96646
## Geodetic CRS:  NAD83
residential_bf <- st_read("Data_GEO511_project/residential_bf.shp")
## Reading layer `residential_bf' from data source 
##   `/__w/final_project-SuiyuanWang/final_project-SuiyuanWang/Data_GEO511_project/residential_bf.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 70158 features and 7 fields
## Geometry type: POLYGON
## Dimension:     XYZ
## Bounding box:  xmin: -78.90966 ymin: 42.83212 xmax: -78.79883 ymax: 42.96502
## z_range:       zmin: 0 zmax: 0
## Geodetic CRS:  WGS 84
#st_crs(residential_bf)
#st_crs(target_geom)
#st_crs(source_geom)
#st_crs(landuse)

Clean required data

For building footprint data, we first filter the data using zoning data that has residential property. Then, make the source data, target data and ancillary data have the consistent crs. Find zoning codes related to residential:

### Residential zonings.The provided 'residential_bf.shp' already after filter using ArcGIS pro.
#residential_code = c('N-2R', 'N-3R', 'N-4-30', 'N-4-50', 'D-R')
#residential <- landuse %>% filter(gcode == residential_code) 
#residential_wgs84  <- residential %>%  st_transform(4326) 

# set projection for geometries.
target_geom_wgs84  <- target_geom %>%  st_transform(4326) 
source_geom_wgs84  <- source_geom %>%  st_transform(4326) 

Disaggregate population

The data was downscaled from the source zone (census block group) to the target zone (census block) using building footprint areas in the target zone as ancillary data. After downscaling, we would convert data as an integer andNA as 0, due to the population number can only be an integer and larger than or equal to zero.

# dasymetric map with building footprint information as ancillary data
sf_use_s2(FALSE)

dm_pop = dasymetric_map(target_geom_wgs84, source_geom_wgs84, residential_bf, extensive = "population")

# convert estimation result to integer, and replace NA as 0.
dm_pop[is.na(dm_pop)] <- 0 
dm_pop$population = as.integer(dm_pop$population)

Results

Evaluating the result by plotting ground truth data and estimation results. Then, calculate R2 and coefficients. Refer To

#calculate r2 in dm_pop, 'population'(estimate) & 'Population'(ground truth)
mod1 = lm(Population~population, data = dm_pop)
modsum = summary(mod1)
plot(dm_pop$Population, dm_pop$population, type = 'p', las = 1,
        xlab = expression(paste('Census block Population')),
        ylab = 'Estimate population')

#adding the regression line from the linear model
abline(mod1)

r2 = modsum$adj.r.squared
print(paste("R2 = ", r2))
## [1] "R2 =  0.619672830387344"
modsum$coefficients
##               Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 18.3236307 1.42747810 12.83637 8.503308e-37
## population   0.8080783 0.01129649 71.53356 0.000000e+00

We are using tmap to show the ground truth and estimation result in an interactive map. By exploring the map, we can easily find the outlier’s location and discuss the reason.

tm_shape(dm_pop) +
  tm_fill(col = c("population", "Population"), 
          style = "jenks", 
          title = c("Estimate_Population", "True_Population")) + 
  tm_facets(ncol = 2, sync = TRUE) +
  tmap_mode("view")
## tmap mode set to interactive viewing

Conclusions

This project aims to apply the dasymetric method to disaggregate census data to get fine-level population data using open-source data. The results show that R2 for this research is 0.62. We can see the potential by only using building footprint as ancillary data to downscaling population. It shows that population distribution has a high correlation with residential building distribution. However, there do have some outliers. The blocks that have a high population tend to be underestimated, and low-population blocks tend to be overestimated. We find the errors are caused by the following reasons: * Inherent error in zoning data. The building classification used zoning data. The buildings in zoning that didn’t have residential property will be weighted as 0, and not input into the dasymetric function for final estimation. So, if the block doesn’t have residential zoning, it will be allocated 0 population. * Blocks have other residential usage units. The other reason is that the block has mixing usage buildings, like commercial and residential mixing buildings cannot be identified. * Buildings have many floors. The building footprint can only know the one-floor area of the buildings. However, a high population density area always has high buildings to occupy more population. For the problems mentioned above, we can try to use more accurate data as ancillary data to weight the population distribution. The zoning data can be crossed verification or replaced using more specific data like tax parcels. As for high buildings, we can first use LiDAR data to find the building height and then use average floor height to get the floor number for buildings. As for the mixing usages of buildings, after we know the building height, we can use points of interest from Google or OpenStreetMap to calculate the percentage of commercial floors in one building.

References

Bakillah, M., et al. (2014). “Fine-resolution population mapping using OpenStreetMap points-of-interest.” International Journal of Geographical Information Science 28(9): 1940-1963. Frantz, D., et al. (2021). “National-scale mapping of building height using Sentinel-1 and Sentinel-2 time series.” Remote Sensing of Environment 252: 112128. Huang, X., et al. (2021). “A 100 m population grid in the CONUS by disaggregating census data with open-source Microsoft building footprints.” Big earth data 5(1): 112-133. Huang, X., et al. (2019). High-resolution population grid in the CONUS using microsoft building footprints: A feasibility study. Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities. Maantay, J. A., et al. (2007). “Mapping population distribution in the urban environment: The cadastral-based expert dasymetric system (CEDS).” Cartography and Geographic Information Science 34(2): 77-102. Mennis, J. (2009). “Dasymetric mapping for estimating population in small areas.” Geography Compass 3(2): 727-745. Mennis, J. and T. Hultgren (2006). “Intelligent dasymetric mapping and its application to areal interpolation.” Cartography and Geographic Information Science 33(3): 179-194. Qiu, Y., et al. (2022). “Disaggregating population data for assessing progress of SDGs: methods and applications.” 15(1): 2-29.